How I learned to stop worrying and love Cosmos DB’s Request Units

Thomas Weiss
6 min read · Jul 27, 2017

This post is a reboot of my initial article about Request Unit provisioning on DocumentDB. It has been updated to reflect recent changes like the evolution of DocumentDB into Cosmos DB and new monitoring capabilities.

As a database-as-a-service platform, Cosmos DB offers a rather unique advantage: predictable performance. You specify the level of throughput you expect, and the database guarantees it will meet that level by dedicating the required resources to your workload.

Request Units are Cosmos DB’s performance currency

You define that level of performance by provisioning, for each container of your database, an amount of Request Units; more precisely, you set how many Request Units you expect the container to be able to serve per second. Provisioned Request Units can start low and scale to tens of thousands or even more.

So, Request Units being the “performance currency” in Cosmos DB, you must be curious to know what they represent… and that’s where it gets tricky, to say the least. They define what I would call a “work capacity”. Each request you issue against your container — any kind of request: reads, writes, queries, stored procedure executions, etc. — has a corresponding cost that will be deducted from your RU credits. So if you provision 400 RU per second and issue a query that costs 40 RU, you will be able to issue 10 such requests per second; any request beyond that will get throttled.
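If you like to sanity-check that kind of math in code, here’s a trivial sketch (the 400 RU/s and 40 RU figures are just the ones from the example above):

```python
# Back-of-the-envelope check of how many requests per second a container can
# absorb, given its provisioned throughput and the observed cost of a request.
# The figures are the ones from the example above; plug in your own.
provisioned_ru_per_second = 400   # throughput provisioned on the container
cost_per_query_ru = 40            # charge of one typical query

max_queries_per_second = provisioned_ru_per_second / cost_per_query_ru
print(f"Sustainable rate: {max_queries_per_second:.0f} queries/second")
# Anything beyond that rate within the same second gets throttled.
```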

The art of evaluating your Request Unit needs

By now you can probably guess that it’s important to properly evaluate how many RU you need to provision: provision too few and some of your requests will get throttled; provision too many and you will pay for unused performance!

It’s actually pretty hard to perform that kind of evaluation from scratch. Microsoft provides some metrics about the costs of basic operations:

  • For a 1 KB document: a read costs 1 RU, a write costs 5 RU
  • For a 100 KB document: a read costs 10 RU, a write costs 50 RU

And there is also a pretty nice capacity planner that can help you estimate your target throughput based on sample documents.

Unfortunately, this only covers CRUD operations and doesn’t give a single hint about how many RU your queries or stored procedure executions will cost. And that’s a big deal because from my experience, the cost of those operations:

  • is dynamic (the cost of a query depends not only on its complexity but also on the number of results it returns), and
  • can get pretty high (tens of RU for a query that filters on a single field, for example).

In a nutshell, it’s pretty hard to predict how many RU your system will require. So is this thing usable at all? Fortunately, yes it is. Read on!

Evaluate the cost of typical queries with the query explorer

Once you’ve fed your Cosmos DB container with some relevant data — that could be test data to start with — you can use the query explorer that’s available from the Azure portal. Besides helping you build and debug your SQL queries, this tool also reports the real cost of those queries:

This will give you a sense of the actual charge involved with typical operations that your system will have to support.
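You can also measure those charges from code: every response from the service reports the charge of the request that produced it. Below is a hedged sketch using the Python SDK of that era (pydocumentdb); the account URL, key and container link are placeholders, and the last_response_headers attribute and x-ms-request-charge header name are how I remember them being exposed — double-check against your SDK version.

```python
# Hedged sketch: reading the request charge of a query from code instead of
# from the portal. Account, key and container link below are placeholders.
import pydocumentdb.document_client as document_client

client = document_client.DocumentClient(
    "https://my-account.documents.azure.com:443/",   # placeholder account
    {"masterKey": "<your-key>"})

collection_link = "dbs/mydb/colls/mycoll"            # placeholder container
query = {"query": "SELECT * FROM c WHERE c.city = 'Lyon'"}

results = list(client.QueryDocuments(collection_link, query))
# The charge comes back as a response header on every request.
charge = client.last_response_headers.get("x-ms-request-charge")
print(f"{len(results)} results, {charge} RU charged")
```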

Cosmos DB throttles your requests intelligently

When you’re exceeding your RU quota, Cosmos DB doesn’t reject your additional requests by just screaming ERROR! Not only does it explicitly flag these throttled requests with the HTTP status code 429, but the response also provides a very useful header: x-ms-retry-after-ms. As its name implies, this header tells you how much time you should wait before retrying.

Although this hint has its limits (it may not be very reliable if multiple clients overload your RU quota at the same time), it’s still very useful information for defining the cool-off period to wait before retrying, and for avoiding a retry policy that is too aggressive (which would make things even worse!).
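If you’re calling the REST API directly and need to roll your own retry policy, here’s a minimal sketch of what honoring that header could look like; issue_request is a hypothetical stand-in for whatever function performs your actual HTTP call:

```python
import time

def call_with_retries(issue_request, max_attempts=5):
    """Retry a Cosmos DB call when it gets throttled (HTTP 429), honoring the
    x-ms-retry-after-ms hint returned by the service.

    `issue_request` is any zero-argument callable returning a response object
    with `status_code` and `headers` (e.g. a functools.partial around
    requests.post) -- a hypothetical stand-in for your own HTTP layer.
    """
    for attempt in range(max_attempts):
        response = issue_request()
        if response.status_code != 429:
            return response
        # The service tells us how long to back off before retrying.
        retry_after_ms = int(response.headers.get("x-ms-retry-after-ms", 1000))
        time.sleep(retry_after_ms / 1000.0)
    raise RuntimeError(f"Still throttled after {max_attempts} attempts")
```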

Microsoft’s official SDKs do the work for you

The good news is, if you’re using one of Microsoft’s official Cosmos DB SDKs (currently available for .NET/.NET Core, Java, Node.js and Python), they all implement retry logic based on the header mentioned above and apply it by default: throttled requests are retried up to 9 times, or for up to 60 seconds after the request was originally issued, whichever comes first. Those parameters are configurable when you create a new DocumentClient instance.
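For reference, here’s what tweaking those parameters could look like with the Python SDK. The module and parameter names below (retry_options.RetryOptions, max_retry_attempt_count, max_wait_time_in_seconds) are assumptions from memory, so verify them against the SDK version you actually use.

```python
# Hedged sketch: tuning the automatic retry behavior when creating a
# DocumentClient with the Python SDK (pydocumentdb). Names are assumptions;
# check your SDK version.
from pydocumentdb import document_client, documents, retry_options

connection_policy = documents.ConnectionPolicy()
connection_policy.RetryOptions = retry_options.RetryOptions(
    max_retry_attempt_count=9,      # up to 9 retries, as described above
    max_wait_time_in_seconds=60)    # give up 60 s after the original request

client = document_client.DocumentClient(
    "https://my-account.documents.azure.com:443/",   # placeholder account
    {"masterKey": "<your-key>"},
    connection_policy)
```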

Request throttling can be easily monitored

From the Azure portal, it is straightforward to monitor the number of throttled requests as well as how many RU you’ve consumed vs. how many you’ve provisioned.

What it looks like to exceed your provisioned throughput!

Even better, you can (and should!) set alerts for when the number of throttled requests exceeds a specific threshold:

Those alerts can dispatch an email to the account administrators or call a custom HTTP webhook.

Scale your throughput elastically and on-demand

Monitoring the consumption of your RU and the ratio of throttled requests will probably reveal that you don’t need to keep a constant performance level throughout the day or the week; many web applications receive less traffic at night or during the weekend.

Cosmos DB’s REST API provides endpoints to programmatically update the performance level of your containers (those endpoints are also exposed through the official SDKs), making it straightforward to adjust the throughput from your code depending on the time of the day or the day of the week. The operation is performed without any downtime, and typically takes effect in less than a minute.
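As an illustration, here’s a hedged sketch of a time-of-day based adjustment using the Python SDK. The offer query/replace pattern and the offerThroughput field reflect how I recall the “offers” endpoints being exposed through the SDK, and the RU figures are purely illustrative — verify against your SDK version before relying on it.

```python
# Hedged sketch: scale a container's provisioned throughput on a schedule.
# The QueryOffers/ReplaceOffer calls and the offerThroughput field are
# assumptions about the SDK surface; the RU values are purely illustrative.
from datetime import datetime

DAYTIME_RU = 2000     # illustrative figures only
NIGHTTIME_RU = 400

def target_throughput(now=None):
    """Pick a throughput level based on the (UTC) hour of day."""
    hour = (now or datetime.utcnow()).hour
    return DAYTIME_RU if 6 <= hour < 22 else NIGHTTIME_RU

def scale_container(client, collection_self_link, new_ru):
    """Find the offer attached to a container and update its throughput."""
    query = ("SELECT * FROM root r WHERE r.resource = '%s'"
             % collection_self_link)
    offer = list(client.QueryOffers(query))[0]
    offer["content"]["offerThroughput"] = new_ru
    client.ReplaceOffer(offer["_self"], offer)
```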

Cosmos DB doesn’t offer performance auto-scaling (yet — I wouldn’t be surprised if that was in the works) but it’s totally possible to implement something similar: you could gather live metrics either from your application code or using the Azure monitoring webhooks, aggregate that data in Azure Stream Analytics and use the output data in some Azure Function to perform real-time adjustment of your throughput provisioning. Well, I think I’ve just found a nice topic for my next blog post!

The path to optimum Cosmos DB usage

The workflow I usually recommend for configuring Request Units on a new project is:

  1. Perform an initial, rough evaluation using the capacity planner and adjust your estimate with the help of the query explorer
  2. If possible, use one of the official Cosmos DB SDKs to benefit from automatic retries when requests get throttled — if you’re working on a platform that isn’t supported and are using Cosmos DB’s REST API directly, implement your own retry policy using the x-ms-retry-after-ms header
  3. Obviously, make sure that your application code gracefully supports the case when all retries fail (see the sketch after this list)
  4. Configure throttling alerts from the Azure portal — start with conservative limits like 10 throttled requests over the last 15 minutes and switch to more permissive rules once you figure out your actual consumption (remember that occasional throttles are fine; they show that you’re playing with the limits you’ve set, and that’s exactly what you want to do)
  5. Use monitoring to understand your traffic pattern, so you can consider the need to dynamically adjust your throughput provisioning over the day/week
  6. Regularly monitor your provisioned vs. consumed RU ratio to make sure you have not over-provisioned your containers
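Regarding step 3, here’s a small, purely illustrative sketch of what “gracefully supporting” exhausted retries can mean, reusing the hypothetical call_with_retries helper sketched earlier in this post:

```python
# Illustrative only: serve degraded data when every retry was still throttled.
# Assumes the call_with_retries helper from the earlier sketch is in scope and
# that issue_request returns a requests-like response object.
def get_recommendations(issue_request, cached_fallback):
    try:
        response = call_with_retries(issue_request)
        return response.json()
    except RuntimeError:
        # All retries exhausted: return stale/cached data (or an empty list)
        # instead of failing the whole user request, and log it for monitoring.
        return cached_fallback
```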

I hope that following these steps will help you to use Cosmos DB efficiently and confidently!


Thomas Weiss

Passionate about data engineering. PM on Azure Cosmos DB. Views are my own and denormalized.